
Adversarial Attacks on Online Learning to Rank with Click Feedback

Neural Information Processing Systems

Online learning to rank (OLTR) is a sequential decision-making problem where a learning agent selects an ordered list of items and receives feedback through user clicks. Although potential attacks against OLTR algorithms may cause serious losses in real-world applications, there is limited knowledge about adversarial attacks on OLTR. This paper studies attack strategies against multiple variants of OLTR. Our first result provides an attack strategy against the UCB algorithm on classical stochastic bandits with binary feedback, which solves the key issues caused by bounded and discrete feedback that previous works cannot handle.
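The binary-feedback attack setting can be illustrated with a minimal sketch: a UCB1 learner on Bernoulli arms, and an attacker that flips 1 → 0 rewards on non-target arms so their empirical means collapse. This is an illustration in the spirit of the setting described above (the arm means, target choice, and flip-everything policy are assumptions for the demo, not the paper's exact attack):

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 5000
means = np.array([0.8, 0.6, 0.3])   # true Bernoulli means; arm 2 is the worst
target = 2                          # attacker wants the agent to prefer arm 2

counts = np.zeros(K)
sums = np.zeros(K)
attack_cost = 0                     # number of flipped rewards

for t in range(T):
    if t < K:
        arm = t                     # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))   # UCB1 index
    r = rng.binomial(1, means[arm])
    # binary attack: flip 1 -> 0 on non-target arms; feedback stays in {0, 1}
    if arm != target and r == 1:
        r = 0
        attack_cost += 1
    counts[arm] += 1
    sums[arm] += r

print(counts[target] / T)           # fraction of pulls on the target arm
```

Because UCB pulls arms with empirical mean 0 only O(log T) times, the flip count stays small while the agent ends up pulling the target arm on the vast majority of rounds.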


Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Kang, Haidong, Wu, Wei, Wang, Hanling

arXiv.org Artificial Intelligence

Few-shot class incremental learning (FSCIL) is a more realistic and challenging paradigm in continual learning that incrementally learns unseen classes and overcomes catastrophic forgetting on base classes with only a few training examples. Previous efforts have primarily centered on designing more effective FSCIL approaches. By contrast, far less attention has been devoted to the security issues of FSCIL. This paper aims to provide a holistic study of the impact of attacks on FSCIL. We first derive insights by systematically exploring how human expert-designed attack methods (i.e., PGD, FGSM) affect FSCIL. We find that these methods either fail to attack base classes or incur high labor costs because they rely heavily on expert knowledge. This highlights the need for a specialized attack method for FSCIL. Grounded in these insights, we propose a simple yet effective method, ACraft, that automatically steers and discovers optimal attack methods targeted at FSCIL by leveraging Large Language Models (LLMs) without human experts. Moreover, to improve the reasoning between LLMs and FSCIL, we introduce a novel Proximal Policy Optimization (PPO) based reinforcement learning scheme that establishes positive feedback, so that LLMs generate better attack methods in the next generation. Experiments on mainstream benchmarks show that ACraft significantly degrades the performance of state-of-the-art FSCIL methods and substantially outperforms human expert-designed attack methods while maintaining the lowest attack cost.



Deep learning models are vulnerable, but adversarial examples are even more vulnerable

Li, Jun, Xu, Yanwei, Li, Keran, Zhang, Xiaoli

arXiv.org Artificial Intelligence

Understanding intrinsic differences between adversarial examples and clean samples is key to enhancing DNN robustness and detection against adversarial attacks. This study first empirically finds that image-based adversarial examples are notably sensitive to occlusion. Controlled experiments on CIFAR-10 used nine canonical attacks (e.g., FGSM, PGD) to generate adversarial examples, paired with original samples for evaluation. We introduce Sliding Mask Confidence Entropy (SMCE) to quantify model confidence fluctuation under occlusion. Using 1800+ test images, SMCE calculations supported by Mask Entropy Field Maps and statistical distributions show adversarial examples have significantly higher confidence volatility under occlusion than originals. Based on this, we propose Sliding Window Mask-based Adversarial Example Detection (SWM-AED), which avoids catastrophic overfitting of conventional adversarial training. Evaluations across classifiers and attacks on CIFAR-10 demonstrate robust performance, with accuracy over 62% in most cases and up to 96.5%.
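The SMCE idea described above can be sketched in a few lines: slide an occlusion patch over the image, record the model's confidence in its original top-1 class at each position, and take the Shannon entropy of the resulting confidence histogram. The function below is an illustrative reimplementation under assumptions (gray patch value, stride, bin count, and the toy stand-in model are all choices for the demo, not the paper's exact settings):

```python
import numpy as np

def smce(image, predict_proba, mask_size=8, stride=4, n_bins=10):
    """Sliding Mask Confidence Entropy (illustrative sketch).

    Slides a gray occlusion patch over the image, records the model's
    confidence in its original top-1 class at each position, and returns
    the Shannon entropy of the histogram of those confidences. Higher
    entropy = stronger confidence fluctuation under occlusion.
    """
    h, w = image.shape[:2]
    cls = int(np.argmax(predict_proba(image)))      # original top-1 class
    confs = []
    for y in range(0, h - mask_size + 1, stride):
        for x in range(0, w - mask_size + 1, stride):
            occluded = image.copy()
            occluded[y:y + mask_size, x:x + mask_size] = 0.5  # gray patch
            confs.append(predict_proba(occluded)[cls])
    hist, _ = np.histogram(confs, bins=n_bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# toy stand-in "model": confidence depends smoothly on mean intensity
def toy_proba(img):
    c = np.clip(img.mean(), 0.05, 0.95)
    return np.array([c, 1 - c])

img = np.random.default_rng(1).random((32, 32))
val = smce(img, toy_proba)
```

On adversarial examples, the confidence values scatter across many bins (high entropy); on clean samples they cluster (low entropy), which is what the detector in the paper thresholds on.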


AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research

Beyer, Tim, Dornbusch, Jonas, Steimle, Jakob, Ladenburger, Moritz, Schwinn, Leo, Günnemann, Stephan

arXiv.org Artificial Intelligence

The rapid expansion of research on Large Language Model (LLM) safety and robustness has produced a fragmented and oftentimes buggy ecosystem of implementations, datasets, and evaluation methods. This fragmentation makes reproducibility and comparability across studies challenging, hindering meaningful progress. To address these issues, we introduce AdversariaLLM, a toolbox for conducting LLM jailbreak robustness research. Its design centers on reproducibility, correctness, and extensibility. The framework implements twelve adversarial attack algorithms, integrates seven benchmark datasets spanning harmfulness, over-refusal, and utility evaluation, and provides access to a wide range of open-weight LLMs via Hugging Face. The implementation includes advanced features for comparability and reproducibility such as compute-resource tracking, deterministic results, and distributional evaluation techniques. AdversariaLLM also integrates judging through the companion package JudgeZoo, which can also be used independently. Together, these components aim to establish a robust foundation for transparent, comparable, and reproducible research in LLM safety.





e1c13a13fc6b87616b787b986f98a111-Supplemental.pdf

Neural Information Processing Systems

This section gives the worst-case time analysis for Algorithm 1, which yields the bound shown in Eq. 3.

B.1 Loss function space L. Recall that the loss function search space is defined as:

(Loss Function Search Space)
L ::= targeted Loss, n with Z | untargeted Loss with Z | targeted Loss, n - untargeted Loss with Z
Z ::= logits | probs

To refer to different settings, we use the following notation: U for the untargeted loss, T for the targeted loss, D for the targeted - untargeted loss, L for using logits, and P for using probs. Effectively, the search space includes all possible combinations, except that the cross-entropy loss supports only probabilities.

B.2 Attack algorithm & parameter space S. Recall the attack space defined as:

S ::= S; S | randomize S | EOT S, n | repeat S, n | try S for n | Attack with params with loss L

The type of every parameter is either integer or float. Generic parameters and the supported loss for each attack algorithm are defined in Table 4.

B.3 Search space conditioned on network property. Following Stutz et al. (2020), we use the robust test error (Rerr) metric and define robust accuracy as 1 - Rerr. Note, however, that Rerr as defined in Eq. 5 involves an intractable maximization problem in the denominator. Note also that we use a zero-knowledge detector model, so none of the attacks in the search space are aware of the detector.
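Read as a product of independent choices, the loss grammar induces a small combinatorial space. The sketch below enumerates it; the concrete loss names are hypothetical placeholders, and the only constraint encoded is the one stated in the text, that cross-entropy supports probabilities but not logits:

```python
from itertools import product

settings = ["U", "T", "D"]     # untargeted / targeted / targeted-minus-untargeted
representations = ["L", "P"]   # logits / probs
losses = ["ce", "hinge"]       # hypothetical loss names for illustration

space = [
    (loss, s, z)
    for loss, s, z in product(losses, settings, representations)
    if not (loss == "ce" and z == "L")   # CE supports probs only
]
# 2 losses x 3 settings x 2 representations = 12, minus 3 CE-with-logits combos = 9
print(len(space))
```

A search procedure over this space would iterate the `space` list (crossed with the attack grammar S and its per-attack parameters) rather than hand-picking one loss configuration.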


Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss

Feng, Naifu, Chen, Lixing, Tang, Junhua, Ding, Hua, Li, Jianhua, Bai, Yang

arXiv.org Artificial Intelligence

Transformer-based models have made significant progress in time series forecasting. However, a key limitation of deep learning models is their susceptibility to adversarial attacks, which has not been studied enough in the context of time series prediction. In contrast to areas such as computer vision, where adversarial robustness has been extensively studied, frequency-domain features of time series data play an important role in the prediction task but have not been sufficiently explored in terms of adversarial attacks. This paper proposes a time series prediction attack algorithm based on frequency-domain loss. Specifically, we adapt an attack method originally designed for classification tasks to the prediction field and optimize the adversarial samples using both time-domain and frequency-domain losses. To the best of our knowledge, there is no prior research on using frequency information for time-series adversarial attacks. Our experimental results show that current time series prediction models are vulnerable to adversarial attacks, and our approach achieves excellent performance on major time series forecasting datasets.
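The core idea of combining time-domain and frequency-domain losses can be sketched as follows. The objective below mixes an MSE on the series with an MSE on FFT magnitudes, and a crude random-search perturbation stands in for the paper's optimizer; the `alpha` weighting, the toy linear forecaster, and the search loop are all illustrative assumptions, not Fre-CW's exact formulation:

```python
import numpy as np

def combined_loss(pred, target, alpha=0.5):
    """Time-domain MSE plus MSE on real-FFT magnitudes (illustrative mix)."""
    time_loss = np.mean((pred - target) ** 2)
    freq_loss = np.mean(
        (np.abs(np.fft.rfft(pred)) - np.abs(np.fft.rfft(target))) ** 2
    )
    return (1 - alpha) * time_loss + alpha * freq_loss

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 64))   # clean input series
w = 0.9                                     # toy forecaster: y_hat = w * x
adv_target = -x                             # attacker-chosen (inverted) forecast
eps = 0.3                                   # perturbation budget (L-inf ball)

# random-search "attack": keep the in-budget perturbation with the lowest loss
init_loss = combined_loss(w * x, adv_target)
best_delta, best_loss = np.zeros_like(x), init_loss
for _ in range(200):
    delta = rng.uniform(-eps, eps, size=x.shape)
    loss = combined_loss(w * (x + delta), adv_target)
    if loss < best_loss:
        best_delta, best_loss = delta, loss
```

A gradient-based attacker would instead backpropagate `combined_loss` through the forecaster, but the objective being minimized has the same two-term structure.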